Covariance Estimation: The GLM and Regularization Perspectives
Abstract
Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is in sparse estimation of a precision matrix or a Gaussian graphical model leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix such as banding, tapering and thresholding has desirable asymptotic properties and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions.
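The regression-based Cholesky idea described in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each variable is regressed on its predecessors in the chosen order, the negated coefficients fill a unit lower-triangular matrix T, and the residual variances form a diagonal matrix D, so that T Σ Tᵀ = D. The function names `modified_cholesky` and `covariance_from_cholesky` are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import numpy as np


def modified_cholesky(Y):
    """Sketch of a regression-based (modified) Cholesky decomposition.

    Y is an (n, p) data matrix whose columns are in the assumed a priori
    order. Returns a unit lower-triangular T and a vector d of residual
    (innovation) variances such that T @ S @ T.T = diag(d), where S is
    the sample covariance with divisor n. Assumes n > p and full rank.
    """
    Y = Y - Y.mean(axis=0)            # center each variable
    n, p = Y.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n      # first variable has no predecessors
    for t in range(1, p):
        X = Y[:, :t]                  # predecessors of variable t
        phi, *_ = np.linalg.lstsq(X, Y[:, t], rcond=None)
        resid = Y[:, t] - X @ phi
        T[t, :t] = -phi               # negated regression coefficients
        d[t] = resid @ resid / n      # residual variance
    return T, d


def covariance_from_cholesky(T, d):
    """Rebuild the covariance matrix as T^{-1} diag(d) T^{-T}."""
    T_inv = np.linalg.inv(T)
    return T_inv @ np.diag(d) @ T_inv.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixing = np.tril(np.ones((4, 4)))             # illustrative dependence
    Y = rng.standard_normal((100, 4)) @ mixing.T
    T, d = modified_cholesky(Y)
    Sigma = covariance_from_cholesky(T, d)
    print(np.allclose(Sigma, np.cov(Y, rowvar=False, bias=True)))
```

In this parameterization the below-diagonal entries of T are unconstrained real numbers and only the entries of d must be positive (for example by modeling their logarithms), which is why the reconstructed matrix is positive definite by construction. The result depends on the column order, which is the a priori ordering cost noted in the abstract.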
Similar resources
CS 545 : Assignment 8 Dan
Mixture of Probabilistic Principal Component Analyzers (MPPCA) is a seminal work in Machine Learning in that it was the first to use PCA to perform clustering and local dimensionality reduction. MPPCA is based upon the Mixture of Factor Analyzers (MFA), which is similar to MPPCA except that it uses Factor Analysis to estimate the covariance matrix. This algorithm is of interest to me because it is re...
Multi-regularization Parameters Estimation for Gaussian Mixture Classifier based on MDL Principle
Regularization addresses the unstable estimation of the covariance matrix from a small sample set in a Gaussian classifier, and estimating multiple regularization parameters is more difficult than estimating a single one. In this paper, KLIM_L covariance matrix estimation is derived theoretically based on the MDL (minimum description length) principle for the small sample problem...
Simultaneous Modelling of Covariance Matrices: GLM, Bayesian and Nonparametric Perspectives
We provide a brief survey of the progress made in modelling covariance matrices from the perspective of generalized linear models (GLM) and the use of link functions (factorizations) that may lead to statistically meaningful and unconstrained reparameterization. We highlight the advantage of the Cholesky decomposition in dealing with the normal likelihood maximization and compare the findings w...
Spatially Structured Sparse Morphological Component Separation for voltage-sensitive dye optical imaging.
BACKGROUND: Voltage-sensitive dye optical imaging is a promising technique for studying the dynamics of neural assemblies in vivo, where functional clustering can be visualized in the imaging plane. Its practical potential is, however, limited by many artifacts. NEW METHOD: We present a novel method, which we call "SMCS" (Spatially Structured Sparse Morphological Component Separation), to separate the rel...
Regularized MMSE multiuser detection using covariance matrix tapering
The linear minimum mean-squared error (MMSE) detector for direct-sequence code-division multiple-access (DS-CDMA) systems relies on the inverse of the covariance matrix of the received signal. In multiuser environments, when few samples are available for the covariance estimation, the ill-conditioning of the matrix may produce large performance degradation. In order to cope with this effect, we propose ...